Robust Speech Recognition Based on Multi-Stream Features
Abstract
In this paper, we discuss a new automatic speech recognition (ASR) approach based on the independent processing and recombination of several feature streams. In this framework, it is assumed that the speech signal is represented in terms of multiple input streams, each input stream representing a different characteristic of the signal. If the streams are entirely synchronous, they may be accommodated simply. However, as discussed in the paper, it may be required to permit some degree of asynchrony between streams, which are then forced to recombine at some temporal anchor points associated with some predefined speech unit levels. We start by introducing the basic framework of a statistical structure that can accommodate multiple observation streams. This approach was initially applied to the case of subband-based speech recognition and was shown to yield significantly better noise robustness. After having summarized these results, the multi-stream approach will be used to combine multiple time-scale features in ASR systems, in our case to use syllable-level features in a phoneme-based HMM.
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
DBN based multi-stream models for speech
We propose dynamic Bayesian network (DBN) based synchronous and asynchronous multi-stream models for noise-robust automatic speech recognition. In these models, multiple noise-robust features are combined into a single DBN to obtain better performance than any single feature system alone. Results on the Aurora 2.0 noisy speech task show significant improvements of our synchronous model over bot...
Multi-Stream Front-End Processing for Robust Distributed Speech Recognition
This paper investigates a multi-stream-based front-end in Distributed Speech Recognition (DSR). It aims at improving the performance of Hidden Markov Model (HMM)-based systems by combining features based on conventional MFCCs and formant-like features to constitute a new multivariate feature vector. The approach presented in this paper constitutes an alternative to the DSR-XAFE (XAFE: eXtended ...
Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments
This paper presents an evaluation of the use of some auditory-based distinctive features and formant cues for robust automatic speech recognition (ASR) in the presence of highly interfering car noise. Comparative experiments have indicated that combining the classical MFCCs with some auditory-based acoustic distinctive cues and either the main formant magnitudes or the formant frequencies of a s...
Multi-stream spectro-temporal features for robust speech recognition
A multi-stream approach to utilizing the inherently large number of spectro-temporal features for speech recognition is investigated in this study. Instead of reducing the feature-space dimension, this method divides the features into streams so that each represents a patch of information in the spectro-temporal response field. When used in combination with MFCCs for speech recognition under both...
Auditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments
In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition ...